Audiovisual spatial recalibration but not integration is shaped by early sensory experience

Summary
To clarify the role of sensory experience during early development for adult multisensory learning capabilities, we probed audiovisual spatial processing in human individuals who had been born blind because of dense congenital cataracts (CCs) and who subsequently received cataract-removal surgery, some not until adolescence or adulthood. Their ability to integrate audiovisual input and to recalibrate multisensory spatial representations was compared with that of normally sighted control participants and of individuals with a history of developmental (later-onset) cataracts. CC individuals showed both normal multisensory integration in audiovisual trials (ventriloquism effect) and normal recalibration of unimodal auditory localization following exposure to audiovisual discrepancies (ventriloquism aftereffect), as observed in the control groups. In addition, only the CC group recalibrated unimodal visual localization after audiovisual exposure. Thus, atypical crossmodal mechanisms coexisted with typical multisensory integration and learning in CC individuals, suggesting that multisensory recalibration capabilities are defined during a sensitive period in development.


INTRODUCTION
Sensory experiences during sensitive periods of development have a strong and lasting impact on how information from the different sensory systems is combined (Putzar et al., 2007; Schorr et al., 2005; Smyre et al., 2021; Sourav et al., 2019; Wallace and Stein, 2007). For example, neuroanatomical and neurophysiological studies in owls have revealed distinct neural mechanisms for crossmodal recalibration depending on the age of the animals: whereas crossmodal recalibration in juvenile owls was mediated by anatomical changes (DeBello et al., 2001; Feldman and Knudsen, 1997), it was associated with top-down-guided physiological changes in adult animals (Hyde and Knudsen, 2002; Knudsen, 1998).
These different neural mechanisms of crossmodal learning in the developing and the adult system were strikingly demonstrated in an experiment in which juvenile owls were exposed to a short period of prism experience, which shifted perceived auditory locations to correct for the visual spatial disparity, and subsequently grew up with typical visual input. When the prisms were refitted in adulthood, these animals, unlike adult owls without early prism experience, demonstrated juvenile-like recalibration of auditory spatial maps. Crucially, this retained learning capacity was limited to the same prism-induced shift of the visual world that had been experienced during development (Knudsen, 1998; Linkenhoker et al., 2005). These results suggest that the involved neural systems were not more plastic in general, but that these animals had access to specific additional anatomical connections established during the juvenile prism experience. Thus, anatomical connectivity acquired during the sensitive period appears to set the limits of functional adaptation in adulthood, and the maintenance of these connections seems to be at least partially independent of use after the sensitive period.
Behavioral findings in humans with late-onset blindness support this hypothesis as well. Despite dramatic changes in the accessible sensory world, late-blind humans have been shown to use the typical visually defined spatial reference frame for representing touch in tasks in which congenitally blind humans use somatotopically defined (Collignon et al., 2009; Röder et al., 2004) or head-centered reference frames (Röder et al., 2007), even when blindness had lasted longer than in the congenitally blind participants of the same study. Similarly, congenitally blind humans have been reported to lack the typical sound-shape associations observed in sighted participants (Fryer et al., 2014; Hamilton-Fletcher et al., 2018; Sourav et al., 2019), whereas late-blind humans still demonstrate them many years after the onset of total blindness (Sourav et al., 2019).
If crossmodal connectivity is indeed acquired during an early sensitive phase of development and stabilized in a way that prevents its loss later in life, individuals who were born blind but recovered sight later in life should demonstrate altered multisensory processes reminiscent of those observed in congenitally blind humans. Moreover, multisensory interactions of the newly acquired visual system with the auditory and tactile systems would be expected to differ from typical ones. The currently available evidence is inconsistent. Individuals who had been born blind because of bilateral dense cataracts and who had undergone cataract-removal surgery later in life demonstrated processing gains from redundant simple crossmodal stimulation (De Heering et al., 2016; Putzar et al., 2007, 2012) and even showed visual-haptic integration as assessed by the precision of size estimates (Senna et al., 2021) or the occurrence of the size-weight illusion (Pant et al., 2021). However, they were less able to integrate audiovisual speech input (Putzar et al., 2007, 2010), did not show typical crossmodal sound-shape associations (Sourav et al., 2019), and used spatial reference frames typical of congenitally blind humans (Azanon et al., 2018; Ley et al., 2013; cf. Collignon et al., 2009; Röder et al., 2004).
Altered multisensory processing might be related to impaired multisensory learning mechanisms such as crossmodal recalibration. Recent findings have suggested two dissociable types of crossmodal recalibration that emerge at two distinct time scales: (1) immediate recalibration on a trial-by-trial basis following a single exposure to a discrepant crossmodal stimulus and (2) cumulative recalibration after prolonged exposure to a consistent crossmodal discrepancy (Bruns and Röder, 2015; Park and Kayser, 2021; Watson et al., 2019). However, whether and how immediate and cumulative multisensory learning mechanisms depend on early sensory input has not yet been investigated in humans. A recent study in children found that crossmodal spatial recalibration does not arise before middle childhood (with immediate recalibration emerging before cumulative recalibration) and thus develops later than the ability to integrate auditory and visual spatial input for localization (Rohlf et al., 2020). Moreover, it has been suggested that the weights assigned to each sensory cue during cumulative recalibration in adulthood do not depend on cue precision but rather seem to be fixed (Rohlf et al., 2021; Zaidel et al., 2011; but see Hong et al., 2021), whereas they depend on cue reliability in both immediate recalibration and multisensory integration (Rohlf et al., 2021). Thus, the late developmental onset of cumulative crossmodal recalibration might be a consequence of the need to acquire the relative weighting of individual sensory cues through extensive experience, a weighting that is subsequently stabilized by an elaboration of anatomical connectivity during development (Linkenhoker et al., 2005).
Multisensory integration requires not only a proper weighting of the sensory cues but also a decision about whether the cues arise from a common cause and, thus, should be integrated at all (Kayser and Shams, 2015). Humans seem to solve this causal inference problem implicitly and optimally (or near-optimally), as predicted by Bayesian hierarchical causal inference (CI) models (Körding et al., 2007; Wozny et al., 2010). These models account for both cue binding and cue weighting by assuming that two perceptual estimates are derived: one under the assumption that the sensory cues originated from a single source (fusion) and one under the assumption that they had separate sources (segregation). Previous studies have suggested some variability across individuals in the decision strategy used to combine these two estimates (Wozny et al., 2010): the majority of participants selected one of the two causal structures in proportion to its probability (probability matching), but in some participants the behavioral responses were better described by a weighted average of the two estimates (model averaging) or by the selection of the most likely causal structure (model selection). Regardless of the specific decision strategy used, the superiority of CI models in explaining human behavior over models that take only cue reliability but not the causal structure into account (forced-fusion models) suggests that multisensory integration typically incorporates both causal inference and cue reliability.
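The three decision strategies can be made concrete for a single audiovisual trial. The following is a minimal sketch of the standard Gaussian formulation of the CI model (Körding et al., 2007; Wozny et al., 2010), assuming given noisy measurements xa and xv (deg), sensory noise SDs sa and sv, a Gaussian location prior with mean mp and SD sp, and a prior probability of a common cause p_common. All function and parameter names are illustrative assumptions; this is not the analysis code of the present study.

```python
import math
import random


def likelihood_common(xa, xv, sa, sv, sp, mp):
    """p(xA, xV | C=1): both measurements stem from one source drawn from the prior."""
    va, vv, vp = sa ** 2, sv ** 2, sp ** 2
    d = va * vv + va * vp + vv * vp
    q = ((xv - xa) ** 2 * vp + (xv - mp) ** 2 * va + (xa - mp) ** 2 * vv) / d
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(d))


def likelihood_separate(xa, xv, sa, sv, sp, mp):
    """p(xA, xV | C=2): each measurement stems from its own independent source."""
    vap, vvp = sa ** 2 + sp ** 2, sv ** 2 + sp ** 2
    q = (xa - mp) ** 2 / vap + (xv - mp) ** 2 / vvp
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(vap * vvp))


def posterior_common(xa, xv, sa, sv, sp, mp, p_common):
    """Posterior probability that the A and V measurements share a common cause."""
    l1 = likelihood_common(xa, xv, sa, sv, sp, mp) * p_common
    l2 = likelihood_separate(xa, xv, sa, sv, sp, mp) * (1 - p_common)
    return l1 / (l1 + l2)


def fused_estimate(xa, xv, sa, sv, sp, mp):
    """C=1 estimate: inverse-variance weighted average of both cues and the prior."""
    wa, wv, wp = sa ** -2, sv ** -2, sp ** -2
    return (wa * xa + wv * xv + wp * mp) / (wa + wv + wp)


def segregated_auditory(xa, sa, sp, mp):
    """C=2 auditory estimate: combines only the auditory cue with the prior."""
    wa, wp = sa ** -2, sp ** -2
    return (wa * xa + wp * mp) / (wa + wp)


def auditory_estimate(xa, xv, sa, sv, sp, mp, p_common, strategy, rng=random):
    """Auditory location estimate under one of three decision strategies."""
    post = posterior_common(xa, xv, sa, sv, sp, mp, p_common)
    fused = fused_estimate(xa, xv, sa, sv, sp, mp)
    seg = segregated_auditory(xa, sa, sp, mp)
    if strategy == "MA":  # model averaging: weight both estimates by the posterior
        return post * fused + (1 - post) * seg
    if strategy == "MS":  # model selection: take the more probable causal structure
        return fused if post > 0.5 else seg
    if strategy == "PM":  # probability matching: sample a structure with probability post
        return fused if rng.random() < post else seg
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical trial: sound measured at 0 deg, flash at 5 deg, vision more precise.
params = dict(xa=0.0, xv=5.0, sa=8.0, sv=2.0, sp=20.0, mp=0.0, p_common=0.5)
for s in ("MA", "MS", "PM"):
    print(s, round(auditory_estimate(**params, strategy=s), 2))
```

In this example the auditory estimate is pulled toward the flash (a ventriloquism effect) under all three strategies, but by different amounts; with a much larger AV discrepancy the posterior for a common cause drops toward zero and the estimates revert to the segregated one.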
If early crossmodal experience is critical for the emergence of multisensory spatial integration and recalibration capabilities, we would expect altered multisensory spatial processing in CC reversal individuals because of the lack of visual (and, as a consequence, altered crossmodal) input after birth. The present study tested this hypothesis by investigating both multisensory integration and crossmodal recalibration in individuals who had been born blind because of bilateral dense cataracts, which were removed between the ages of 5 months and 33 years. Although congenital bilateral dense cataracts typically do not result in total blindness (i.e., a loss of light perception), the CC reversal individuals selected for the present study fulfilled the World Health Organization (2019) criteria for blindness before cataract-removal surgery (i.e., visual acuity worse than 3/60).

iScience Article, iScience 25, 104439, June 17, 2022 (open access)
We used a typical, well-established ventriloquism paradigm with simple sounds and visual stimuli presented with a spatial disparity. The participants' task was to localize both the auditory and the visual component of the crossmodal stimuli, which allowed us to assess the ability to spatially integrate audiovisual input after reversal of congenital blindness. Additional unimodal visual and auditory test trials were included to assess the immediate (VAEi) and cumulative ventriloquism aftereffects (VAEc), which indicate crossmodal recalibration (for a schematic illustration of the experimental setup, see Figure 1). We expected a smaller ventriloquism effect (VE), and relatedly a smaller VAEi, in CC reversal individuals than in normally sighted controls because of persisting visual deficits. If normative causal inference and optimal multisensory integration require exposure to audiovisual correspondences during a sensitive phase of early development, we would also expect the localization performance of CC reversal individuals to be less well described by a CI model. Moreover, if the relative weighting of individual cues for cumulative crossmodal recalibration is acquired during a sensitive phase of development, we would expect a smaller influence of vision on the recalibration of auditory spatial representations but a larger influence of hearing on the recalibration of visual spatial representations in CC reversal individuals than in normally sighted controls and individuals who had been treated for developmental (i.e., late-onset) cataracts.

RESULTS
To determine the influence of early visual experience on multisensory integration (as indicated by the VE) and learning (as indicated by the VAE) in later life, we compared a group of 11 individuals who had been born blind because of dense bilateral congenital cataracts (CC group) and who underwent cataract-removal surgery later in life (between 5 months and 33 years of age) with two control groups: 10 normally sighted control participants (SC group) and 10 individuals with developmental cataracts (DC group). All participants localized auditory (A) and visual (V) stimuli presented at four different azimuthal positions. A and V stimuli were either presented together with different audiovisual (AV) spatial discrepancies (adaptation trials used to assess the VE) or presented alone as unimodal test trials to assess immediate recalibration effects (VAEi), that is, a shift in sound or visual localization after a single exposure to a spatially discrepant AV stimulus. In a second block, the AV spatial discrepancy was fixed at 10° to assess cumulative recalibration effects (VAEc) in the unimodal test trials.
Early visual input is not necessary for the development of audiovisual spatial integration (VE)
In AV trials, participants had to localize both the A and the V stimulus components, as in previous studies of the VE (Jackson, 1953; Körding et al., 2007; Mohl et al., 2020; Wozny et al., 2010; Wozny and Shams, 2011). The degree to which these responses were modulated by the AV spatial discrepancy (i.e., the size of the VE) is thus an indicator of multisensory integration. Here we defined the VE as the difference in localization responses between AV trials with a rightward discrepancy (V to the right of A) and AV trials with a leftward discrepancy (V to the left of A). The resulting VE values are shown in Figure 2 (for group-averaged responses as a function of AV discrepancy, see also Figure S1). To further characterize multisensory integration in all groups, we next modeled the spatial distributions of A and V responses as a function of the AV discrepancy (see Figures S2-S5). We fitted seven models to each participant's Block 1 AV localization responses (ventriloquism effect) and compared them: three Bayesian hierarchical causal inference (CI) models of multisensory perception, each with a different decision strategy (Wozny et al., 2010), namely model averaging (CI-MA), probability matching (CI-PM), and model selection (CI-MS), and four non-CI baseline models. The CI models accounted for uncertainty about the causal structure that could give rise to the A and V cues; that is, the relative probability of a common versus separate sources causing the cues was assessed to determine whether and how to integrate them (Körding et al., 2007). Unlike the CI models, the baseline models assumed a fixed causal structure and either always integrated (forced fusion, FF) or always segregated (SG) the A and V cues to estimate the stimulus locations. The FF and SG models each came in two forms: with a Gaussian location prior (FF, SG) or without priors (FFnp, SGnp).
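The VE definition above amounts to a difference of means. A minimal sketch follows, assuming that each AV trial is stored with its signed discrepancy (V position minus A position, in degrees; positive means V to the right of A) and that responses are converted to localization errors relative to the true position of the localized component. The error-based normalization is our assumption; it gives the same result as raw responses when stimulus positions are balanced across discrepancy directions. The record layout and function name are hypothetical.

```python
from statistics import mean


def ventriloquism_effect(trials, component="A"):
    """Mean localization error on rightward-discrepancy AV trials minus the mean
    on leftward-discrepancy AV trials, for the A or the V component.

    Each trial is a dict with hypothetical keys:
      'disc'             : V position minus A position (deg); > 0 means V right of A
      'loc_A', 'loc_V'   : true stimulus positions (deg)
      'resp_A', 'resp_V' : localization responses (deg)
    Zero-discrepancy trials carry no directional information and are skipped.
    """
    errors = {"right": [], "left": []}
    for t in trials:
        if t["disc"] == 0:
            continue
        err = t[f"resp_{component}"] - t[f"loc_{component}"]
        errors["right" if t["disc"] > 0 else "left"].append(err)
    return mean(errors["right"]) - mean(errors["left"])


# Made-up responses for illustration only.
trials = [
    {"disc": 10, "loc_A": 0, "loc_V": 10, "resp_A": 3, "resp_V": 10},
    {"disc": 10, "loc_A": 10, "loc_V": 20, "resp_A": 12, "resp_V": 19},
    {"disc": -10, "loc_A": 0, "loc_V": -10, "resp_A": -2, "resp_V": -10},
    {"disc": -10, "loc_A": 10, "loc_V": 0, "resp_A": 7, "resp_V": 1},
]
print(ventriloquism_effect(trials, "A"))  # 5.0: sounds pulled toward the flashes
print(ventriloquism_effect(trials, "V"))  # -1.0: flashes pulled slightly toward the sounds
```

With this sign convention, a visual bias toward the sound appears as a negative value, because on rightward-discrepancy trials the sound lies to the left of the flash.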
The best model for each participant was selected as the one with the lowest Bayesian information criterion (BIC) value (Tables 1 and S1). No participant was best fitted by model FFnp or SGnp. A chi-square test of independence on the numbers of participants best fitted by the remaining five models indicated that the distribution of best-fitting models did not depend on group [χ²(8) = 11.42, p = 0.179].
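Model selection by BIC and the group-by-model chi-square test can be sketched as follows. BIC = k ln(n) - 2 ln(L_max), where k is the number of free parameters, n the number of observations, and L_max the maximized likelihood. The log-likelihoods, parameter counts, and contingency counts below are made-up illustrations, not the study's fits or Table S1. The sketch drops models that never won (zero column totals would give zero expected counts), which mirrors the exclusion of FFnp and SGnp and yields (3 - 1) × (5 - 1) = 8 degrees of freedom for three groups and five remaining models.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L_max); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(fits, n_obs):
    """Name of the lowest-BIC model; fits maps name -> (max log-likelihood, n params)."""
    return min(fits, key=lambda name: bic(*fits[name], n_obs))


def chi_square_independence(table):
    """Pearson chi-square statistic and df for a groups x models count table.

    Columns whose total is zero (models that never won) are dropped first.
    """
    keep = [j for j in range(len(table[0])) if sum(row[j] for row in table) > 0]
    table = [[row[j] for j in keep] for row in table]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(keep) - 1)
    return stat, df


# Hypothetical fits for one participant (parameter counts are illustrative).
fits = {"CI-PM": (-500.0, 5), "CI-MS": (-505.0, 5), "FF": (-560.0, 4)}
print(best_model(fits, n_obs=200))
# Made-up counts: 3 groups x 7 models, two of which never win.
counts = [[1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0], [1, 1, 0, 1, 1, 0, 0]]
print(chi_square_independence(counts))
```

A p-value can then be obtained from the chi-square survival function, for example with `scipy.stats.chi2.sf(stat, df)`; note that with small expected counts the chi-square approximation is rough.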
Overall, model CI-PM predicted the AV localization responses best: it was the best-fitting model for 17 of 31 (54.8%) participants in the total sample, including the majority of CC and SC participants (Table S1), and had the lowest overall BICs for CC and SC participants (Table 1). CI-MS accounted for most of the remaining participants (22.6%) and was the best-fitting model in DC participants, with CI-PM performing only slightly worse than CI-MS in this group (Table 1). Overall, this finding is similar to previous behavioral studies of the VE in normally sighted human participants, which found that CI-PM explained AV spatial localization data better than CI-MA and CI-MS (Wozny et al., 2010; Wozny and Shams, 2011). Importantly, in all three groups the BIC values indicated that all three CI models described the data substantially better than each of the four non-CI models. The forced-fusion model with no priors (FFnp), commonly known as the optimal integration model (Alais and Burr, 2004; Ernst and Banks, 2002), described the data worst and did not improve much even after incorporating the location prior (Table 1). In summary, the AV localization pattern of most participants, including the majority of CC participants, was highly consistent with Bayesian causal inference, as evidenced by the overall superior performance of the CI models compared with the baseline models. These findings suggest that multisensory integration in CC individuals was guided by computational principles similar to those in the control groups.

Immediate crossmodal recalibration (VAEi) recovers after restoring sight in congenital blindness
To assess whether participants immediately recalibrated their unimodal localization in response to spatially discrepant AV stimuli on a trial-by-trial basis, we computed the difference in localization responses between unimodal trials (A and V trials, respectively) which were preceded by an AV trial with a rightward discrepancy and unimodal trials which were preceded by an AV trial with a leftward discrepancy. The resulting VAEi values are shown in Figure 3 (for group-averaged responses as a function of the spatial discrepancy in the preceding AV trial, see also Figure S6).
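The trial-by-trial (immediate) aftereffect can be computed by sorting each unimodal trial according to the sign of the discrepancy in the AV trial directly before it. The sketch below assumes a chronologically ordered list of trial records with hypothetical keys and, as in the VE sketch, uses localization errors (response minus true position); unimodal trials not directly preceded by a discrepant AV trial are ignored.

```python
from statistics import mean


def immediate_aftereffect(trials, modality="A"):
    """Immediate aftereffect (deg) for unimodal trials of one modality.

    trials: chronologically ordered dicts with hypothetical keys
      'type'        : 'AV', 'A', or 'V'
      'disc'        : for AV trials, V position minus A position (deg)
      'loc', 'resp' : for unimodal trials, true position and response (deg)
    """
    right, left = [], []
    for prev, cur in zip(trials, trials[1:]):
        # keep only unimodal trials directly preceded by a discrepant AV trial
        if cur["type"] != modality or prev["type"] != "AV" or prev["disc"] == 0:
            continue
        error = cur["resp"] - cur["loc"]
        (right if prev["disc"] > 0 else left).append(error)
    return mean(right) - mean(left)


# Made-up session for illustration only.
session = [
    {"type": "AV", "disc": 10},
    {"type": "A", "loc": 0, "resp": 2},
    {"type": "AV", "disc": -10},
    {"type": "A", "loc": 0, "resp": -1},
    {"type": "AV", "disc": 20},
    {"type": "A", "loc": 10, "resp": 13},
    {"type": "AV", "disc": -20},
    {"type": "V", "loc": 10, "resp": 10},
]
print(immediate_aftereffect(session, "A"))  # 3.5
```

A positive value indicates that unimodal localization shifted in the direction of the preceding V-relative-to-A discrepancy, which is the expected direction for an auditory aftereffect.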
In all three groups, localization responses in A trials were biased in the direction of the preceding AV discrepancy [p ≤ 0.018, BF+0 ≥ 5.68], indicating a typical auditory VAEi. The size of the auditory VAEi did not differ significantly between groups [F(2,18.34) = 0.32, p = 0.728; BF10 = 0.25]. In contrast to the auditory VAEi, localization responses in V trials were not significantly modulated by the preceding AV spatial discrepancy in any of the three groups [p > 0.999, BF−0 ≤ 0.41], and there were no significant group differences [F(2,17.35) = 2.10, p = 0.153; BF10 = 0.54]. Thus, on a trial-by-trial basis, all three groups recalibrated auditory but not visual localization in response to immediately preceding spatially discrepant AV stimuli.
Across groups, the size of the auditory VAEi was significantly correlated with the size of the auditory VE [rs = 0.49, p = 0.005], suggesting a link between multisensory integration and immediate crossmodal recalibration (see Figure 4). The correlation between auditory VE and VAEi was significant within the CC group [p = 0.012] but not within the DC and SC groups [p ≥ 0.494].
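Spearman's rank correlation (rs) is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they occupy. A minimal, dependency-free sketch follows; in practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value. The per-participant values in the example are made up for illustration.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-participant effect sizes (deg), for illustration only.
auditory_ve = [2.0, 5.5, 3.1, 8.0, 4.4]
auditory_vaei = [0.3, 1.2, 0.5, 1.0, 1.2]
print(round(spearman(auditory_ve, auditory_vaei), 3))
```

Because it depends only on ranks, rs is robust to a few participants with unusually large effects, which is a sensible choice for small groups such as those tested here.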
Early sensory experience defines the role of vision and hearing for calibrating multisensory spatial representations (VAEc)
To assess whether participants recalibrated their unimodal localization in response to cumulative evidence for a crossmodal spatial mismatch, we calculated the difference in unimodal localization responses (A and V trials, respectively) between Block 2, in which AV trials featured a constant spatial discrepancy of 10°, and Block 1, in which AV trials had a mean spatial discrepancy of 0°. The resulting VAEc values (see Figure 5) correspond to changes in the likelihood mean bias parameters (ΔA, ΔV) of our computational models from Block 1 to Block 2 (for tests of changes in other model parameters, see Table S2 and Figures S2-S4).
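The cumulative aftereffect reduces to a between-block difference in mean unimodal localization error. The sketch below is illustrative only: it assumes errors are expressed as response minus true position (deg) and introduces a hypothetical `direction` argument so that positive values indicate a shift toward the other modality's component. This section does not state on which side the 10° Block 2 discrepancy lay; the example assumes V was displaced to the right of A, so an auditory shift toward V is rightward (+1) and a visual shift toward A is leftward (-1).

```python
from statistics import mean


def cumulative_aftereffect(block1_errors, block2_errors, direction=1):
    """Shift (deg) in mean unimodal localization error from Block 1 to Block 2.

    block1_errors, block2_errors: response minus true position for unimodal
    trials of one modality. direction (hypothetical convention): +1 if a shift
    toward the other modality's component points rightward, -1 if leftward.
    """
    return direction * (mean(block2_errors) - mean(block1_errors))


# Made-up errors, assuming V was 10 deg to the right of A in Block 2.
vaec_a = cumulative_aftereffect([0.5, -0.5, 1.0, -1.0], [2.0, 3.0, 1.5, 2.5], direction=1)
vaec_v = cumulative_aftereffect([0.0, 0.5, -0.5, 0.0], [-1.0, -0.5, -0.5, 0.0], direction=-1)
print(vaec_a, vaec_v)  # 2.25 0.5
```

Subtracting the Block 1 mean removes each participant's baseline localization bias, so the VAEc reflects only the change attributable to the consistent discrepancy.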
In the CC and SC groups, localization responses in A trials were significantly biased in the direction of the constant AV spatial discrepancy in Block 2 [p ≤ 0.006, BF+0 ≥ 24.30], indicating a typical auditory VAEc.
The duration of blindness may influence the degree to which (but not whether) audiovisual spatial functions recover
Because logMAR (logarithm of the minimum angle of resolution) visual acuity varied between 0.21 and 1.29 in CC individuals and between −0.05 and 0.63 in DC individuals, and was considerably worse in both groups than in SC individuals, who all had normal or corrected-to-normal vision, we tested whether visual acuity was correlated with the size of the VE, VAEi, and VAEc (see Figure S7). None of the effects was significantly correlated with visual acuity in either the CC group [p > 0.999] or the DC group [p ≥ 0.396].
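To relate the logMAR values above to Snellen notation, including the WHO blindness criterion of visual acuity worse than 3/60 mentioned in the Introduction, the conversion is logMAR = log10(MAR), where the minimum angle of resolution in arcminutes equals the Snellen denominator divided by the numerator. The helper names below are illustrative. The arithmetic shows that 3/60 corresponds to logMAR ≈ 1.30 (higher values are worse), so the post-surgical CC range of 0.21 to 1.29 lies entirely on the better side of that criterion.

```python
import math


def snellen_to_logmar(numerator, denominator):
    """logMAR = log10(minimum angle of resolution in arcmin) = log10(denominator / numerator)."""
    return math.log10(denominator / numerator)


def logmar_to_snellen_denominator(logmar, numerator=6):
    """Snellen denominator equivalent to a logMAR value at a given test distance."""
    return numerator * 10 ** logmar


print(round(snellen_to_logmar(3, 60), 2))          # 1.3 (WHO blindness threshold 3/60)
print(round(snellen_to_logmar(6, 6), 2))           # 0.0 (normal acuity, 6/6)
print(round(logmar_to_snellen_denominator(1.29)))  # 117, i.e., about 6/117
```

Negative logMAR values, such as the −0.05 observed in the DC group, correspond to acuity slightly better than 6/6.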
We also verified that all participants were able to reliably localize the A and V stimuli. To this end, we computed localization precision in Block 1 as the individual SDs of the localization responses in A and V trials, respectively (see Figure 6). Unimodal localization precision did not differ significantly between CC, DC, and SC individuals, either for A precision [F(2,17.81) = 1.14, p = 0.344; BF10 = 0.48] or for V precision [F(2,15.07) = 2.90, p = 0.086; BF10 = 3.00]. In all three groups, V precision was significantly higher than A precision [p ≤ 0.025; BF+0 ≥ 3.35]. This suggests that all participants, even those with severe visual impairments, were able to localize the A and V stimuli used in the present study.
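The text does not specify how the individual SDs were pooled across stimulus positions. The sketch below assumes one plausible scheme (hypothetical, not confirmed by the source): the SD of the responses is computed separately for each stimulus position, so that the spread of target positions does not inflate variability, and then averaged across positions. A smaller SD means higher precision, so "V precision higher than A precision" corresponds to a smaller SD for V trials.

```python
from collections import defaultdict
from statistics import mean, stdev


def localization_precision(trials):
    """Localization SD (deg) for one modality; smaller values = higher precision.

    trials: iterable of (true_position, response) pairs in degrees.
    Assumption: SDs are computed per stimulus position and then averaged;
    each position needs at least two responses for stdev to be defined.
    """
    by_position = defaultdict(list)
    for position, response in trials:
        by_position[position].append(response)
    return mean([stdev(responses) for responses in by_position.values()])


# Made-up responses for illustration only.
auditory = [(0, -1), (0, 1), (10, 8), (10, 12)]
print(round(localization_precision(auditory), 3))
```

Computing the SD per position rather than over all responses at once avoids confusing genuine response scatter with the variance of the target layout itself.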
Finally, in an exploratory analysis, we compared the sizes of the VE, VAEi, and VAEc between CC individuals (n = 4) who had their cataracts removed relatively early in life (between 5 and 25 months of age) and CC individuals (n = 6) who had cataract surgery only much later in life (at or after 14 years of age). The auditory VE, VAEi, and VAEc tended to be weaker, and the visual VE and VAEc tended to be stronger, in CC individuals who had experienced an extended period of blindness from birth (see Figure S8). However, none of these differences reached statistical significance [p ≥ 0.334; BF10 ≤ 0.66].

DISCUSSION
Although studies in owls have demonstrated a crucial role of crossmodal experience for adult multisensory learning (recalibration) capabilities, it has remained unclear how early visual experience affects human multisensory learning. In a rare human model, sight-recovery individuals with a history of congenital blindness (in several cases an extensive one) due to bilateral dense cataracts, the present study found surprisingly normal multisensory spatial integration (as indicated by the VE) and recalibration (as indicated by the auditory VAEi and VAEc) capabilities. However, in addition to using visual spatial information to recalibrate auditory localization (auditory VAEc), as seen in normally sighted controls and in individuals who had received cataract-removal surgery for incomplete or later-developed cataracts, only CC reversal individuals used auditory spatial information to recalibrate visual localization (visual VAEc). Both effects, the additional use of auditory input to calibrate visual spatial representations (visual VAEc) and intact multisensory integration as assessed by the VE and computational modeling, were consistently seen across CC individuals, all of whom had received cataract-removal surgery late (i.e., at or after the age of 5 months) but whose ages at surgery varied considerably, up to 33 years. We therefore speculate that the additional recalibration mechanism observed in CC reversal individuals originates from a neural mechanism that exists as a consequence of congenital visual deprivation.
Studies in owls (Brainard and Knudsen, 1998; DeBello et al., 2001; Hyde and Knudsen, 2002) that manipulated the crossmodal spatial correspondence of auditory and visual input through the use of prisms found both a reduced elimination of anatomical connections during development and additional axonal growth that served the atypical mapping induced by prism experience (DeBello et al., 2001). These anatomical changes allowed the prism-reared animals as adults to toggle between two parallel and independent neural organizations, a typical and an atypical one, likely by employing inhibitory mechanisms (Hyde and Knudsen, 2002). These findings suggest a more general developmental principle: experience shapes structural brain networks only during sensitive periods, and these networks then form the scaffold for future learning. Applied to our human model, in which we compared the effects of the presence versus absence of visual input during early development, we speculate that CC individuals had preserved typical multisensory connectivity, allowing them to regain typical multisensory spatial functions, including the VE and the auditory VAEc, after sight restoration. Thus, we suggest that CC individuals were able to "fall back" to a typical organization (Brainard and Knudsen, 1998). In addition, we suggest that they had stabilized an additional set of neural connections as a consequence of congenital visual deprivation, resulting in the later use of auditory spatial information for calibrating visual representations (visual VAEc). Thus, CC individuals might achieve internal consistency between auditory and visual spatial representations by recalibrating both auditory and visual spatial representations rather than only auditory ones. A similar coexistence of typical and atypical multisensory links has also been observed for the interaction of auditory and visual motion processing (Guerreiro et al., 2016).
Although a change of auditory motion perception (a motion aftereffect) after observing visual motion stimuli was found in both sighted controls and CC individuals, a visual aftereffect following the presentation of auditory motion was observed only in CC individuals. Thus, in both cases the typical multisensory function, characterized by visual dominance, recovered in CC individuals despite the maintenance of an additional atypical multisensory link.
In congenitally blind humans, stronger auditory-driven activity of multisensory parietal cortex than in sighted individuals has been reported during auditory localization tasks (Gougoux et al., 2005; Renier et al., 2010; Röder et al., 1999). A stronger auditory influence on multisensory brain regions might arise either from a lack of the pruning or inhibition of exuberant crossmodal connections that typically occurs during development (Johannsen and Röder, 2014; see Lewkowicz and Röder, 2012), or from an elaboration of additional connections strengthening the auditory influence in multisensory structures (Rauschecker, 1995). Given the lack of visual information, such strengthening might be useful for efficiently orienting to and localizing external events. After cataract removal, the additional connections serving auditory spatial input during blindness might (like the atypical connectivity in owls; see Brainard and Knudsen, 1998; DeBello et al., 2001) not be lost but rather be retained and used as a second mechanism for achieving crossmodal consistency. A recent analysis of blood-oxygen-level-dependent (BOLD) resting-state activity in CC reversal individuals, most of them with long visual deprivation periods, has indeed suggested altered audiovisual system interactions (Raczy et al., in press). Moreover, previous studies in CC reversal individuals suggested an enhanced auditory influence in visual cortex after sight restoration (Collignon et al., 2015; Guerreiro et al., 2015). However, how these changes in crossmodal interactions relate to the current behavioral findings remains unresolved.
In contrast to cumulative crossmodal recalibration (VAEc), we did not observe any significant differences in multisensory integration (VE) capabilities between CC individuals and normally sighted controls: CC individuals featured a significant auditory and a nonsignificant visual VE, as did the SC group. A strong visual influence on auditory localization in CC individuals fits with their higher visual than auditory localization precision in the present study, as found in the normally sighted controls. Computational modeling analyses indeed suggested that CC individuals employ multisensory integration principles similar to those typically observed in the normally sighted population (Körding et al., 2007; Wozny et al., 2010) and replicated in the present study in both SC and DC individuals: causal inference models explained multisensory integration best in all three groups. Consistent with previous studies, there was some variation in the best-fitting CI decision strategy across individuals. Nevertheless, the majority of our participants, including the majority of CC individuals, were best described by a probability-matching strategy (Wozny et al., 2010). Regardless of this variation in individual decision strategies, the superiority of the CI models indicates that localization responses were based on a normative causal inference process in all three groups. Thus, in a task that is typically characterized by strong visual dominance and by a decline of the visual influence over development (Rohlf et al., 2020), we observed extensive recovery in individuals who had experienced a phase of congenital blindness before sight restoration (see also Senna et al., 2021). A similar pattern emerged for immediate (trial-by-trial) recalibration: despite an additional visual VAEc in CC individuals, no additional visual VAEi was detected, whereas a typical auditory VAEi was revealed.
In fact, the sizes of the auditory VAEi and VE were significantly correlated in the CC group, as has previously been demonstrated in normally sighted individuals (Rohlf et al., 2021). Thus, the present results provide further evidence for a dissociation between cumulative crossmodal spatial recalibration (VAEc) on the one hand and both immediate crossmodal recalibration (VAEi) and multisensory integration (VE) on the other, suggesting that the latter two processes originate from overlapping neural mechanisms (Park and Kayser, 2019).
The results for the VE contrast with previous reports of impaired integration of visual and auditory speech signals in CC individuals (Putzar et al., 2007, 2010). However, we think there is a crucial difference between audiovisual speech perception and audiovisual localization: in contrast to localization, speech perception is better achieved through the auditory than through the visual (lipreading) channel. Prospective studies have shown that the McGurk effect, that is, the fusion of incongruent auditory and visual speech signals, increases between 3 and 9 years of age (Hirst et al., 2018). Younger children were more likely to report the auditory input and needed a relatively more reliable visual stimulus in order to report a fused percept (Hirst et al., 2018). By contrast, the VE was stronger in children than in adults (Rohlf et al., 2020); that is, the VE declines rather than increases with age. Thus, in both cases, more adult-like multisensory percepts arise across development from an increased weighting of the less reliable (less dominant) sensory input. This requires an increased weighting of vision in audiovisual speech perception, but an increased weighting of auditory input in audiovisual localization (and hence a reduction of the visual influence). In CC individuals, the latter is likely already in place, whereas the former can develop only after sight restoration. Thus, the different developmental time courses of audiovisual speech and spatial processes might have favored the recovery of multisensory spatial integration despite the remaining, and often quite severe, post-surgical visual impairments typically seen in CC individuals. Our modeling results suggest that CC individuals, like normally sighted individuals, weighted auditory and visual spatial information according to its current relative reliability.
A typical VE in CC individuals does not, however, necessarily mean that the VE does not require visual or crossmodal input to emerge, but rather that such input is not required during the first phase of life (Senna et al., 2021).
These results are seemingly at odds with studies in dark-reared cats in which no multisensory enhancement was found in multisensory neurons of the superior colliculus (Wallace et al., 2004; Yu et al., 2019). Recently, however, multisensory gains in superior colliculus neurons were observed when the animals were tested in light rather than in the dark (Smyre et al., 2021). We conducted our experiment in a lit room, and, moreover, unlike the previously tested dark-reared cats, the CC individuals of the present study had extensive experience with audiovisual stimuli in natural environments after cataract-removal surgery, many of them for several years. This experience might have allowed the typical visual influence on auditory localization to recover. A prerequisite for such recovery is that unisensory visual spatial representations recover after sight restoration. Indeed, a regular visual topographic organization has been demonstrated in visually deprived owls (Du Lac and Knudsen, 1991) and ferrets (King and Carlile, 1993) raised with binocular lid suture, and in primary visual cortex of human CC individuals (Sourav et al., 2018). Moreover, in both owls (Knudsen et al., 1991) and ferrets (King and Carlile, 1993), at least a crude alignment of auditory and visual receptive fields was observed immediately after visual deprivation ended. Diffuse residual light perception, which exists even with bilateral dense cataracts, as in lid-sutured animal models (Sherman and Spear, 1982), may have contributed to the preservation of typical crossmodal connectivity in the CC individuals of the present study.

Why did recovery in CC individuals differ between multisensory integration (VE) and recalibration (VAEc)?
Multiple independent lines of evidence suggest that multisensory integration (VE) and crossmodal recalibration (VAEc) reflect at least partially distinct mechanisms: event-related potential studies revealed an early versus late modulation for the VAEc and VE, respectively (Bonath et al., 2007; Bruns et al., 2011; Bruns and Röder, 2010). Prospective studies in children have recently observed that the VE emerges before the VAEc (Rohlf et al., 2020, 2021), thus providing developmental evidence for a dissociation of the two effects. Moreover, the VAEc was found to be independent of cue reliability, whereas the VE, in accord with an extensive literature (Alais and Burr, 2019), varied with cue reliability even in the youngest (5-year-old) children tested (Rohlf et al., 2020, 2021). Zaidel et al. (2011) suggested that crossmodal recalibration uses a fixed-ratio weighting of the crossmodal cues that is independent of current cue reliability (but see recent results from Hong et al., 2021). Whereas multisensory integration aims at improving precision and thus should take cue reliability into account, crossmodal recalibration might aim at achieving internal consistency (i.e., accuracy) of individual cues, which does not depend on reliability (Zaidel et al., 2011).
If crossmodal recalibration indeed uses a fixed-ratio weighting of cues, the question arises how this ratio is set. The finding that crossmodal recalibration emerges relatively late (>7 years) in development (Rohlf et al., 2020) suggests that it is experience-dependent. Recent studies have provided evidence that crossmodal temporal biases with a high intra-individual stability are altered in CC individuals, suggesting that crossmodal settings are indeed acquired and stabilized during a sensitive phase of development (Chen et al., 2017; Badde et al., 2020). Similarly, the fixed-ratio weighting of sensory cues during crossmodal recalibration seems to have a high stability and consistency across humans and other primates (Zaidel et al., 2011) and thus might be stabilized based on experience during a sensitive period. This is supported by the high inter-individual consistency with which the visual VAEc emerged in the CC group. However, it should be noted that in the present study a significant group difference in the visual VAEc was obtained only between the CC and SC groups, not between the CC and DC groups.
In summary, we found multisensory integration to be indistinguishable across groups (no visual VE emerged in the CC group), whereas for cumulative crossmodal recalibration we observed an expansion to recalibration of visual localization by auditory input in the CC group. We speculate that the auditory VE, which seems to emerge early in development (Rohlf et al., 2020) and is highly dependent on cue reliability both in children (Rohlf et al., 2020, 2021) and in adults (Alais and Burr, 2004), reflects a multisensory process that has to maintain a high level of plasticity throughout life in order to adjust instantly to changes in the environment, given the need to maximize localization precision. By contrast, crossmodal recalibration is mainly important for maintaining localization accuracy. After the body has reached a certain level of maturity, the adaptations expected in the future are relatively minor for the majority of the population (as for crossmodal temporal biases; Badde et al., 2020). If experience is expected to remain stable in the population after a certain developmental phase, a sensitive period is an efficient evolutionary strategy (Frankenhuis and Walasek, 2020).

Limitations of the study
Our results demonstrate that early crossmodal experience is not necessary for the emergence of normal multisensory spatial integration (VE) and crossmodal recalibration (auditory VAEc) capabilities, but that a lack of vision during early development results in the establishment of additional recalibration mechanisms (visual VAEc) not seen in sighted individuals or in individuals with a later-developing visual impairment. Because of the rare patient group tested in our study, our statistical analyses had to be based on a relatively small sample. However, the typical pattern of multisensory integration and learning (auditory VE, VAEi, and VAEc) was replicated in all three groups of the present study, and the additional visual VAEc observed in the CC group was highly consistent across CC individuals. All these effects emerged robustly despite the use of a dual-task paradigm, in which participants had to report both the auditory and the visual location in crossmodal trials and which might thus have diminished integration of the crossmodal cues compared with a joint report. Our behavioral findings cannot, however, provide direct insights into the possible neural basis of the change in recalibration mechanisms resulting from a lack of early visual experience. Moreover, we are not able to precisely delineate the timing of the sensitive period, because we had to rely on naturally occurring cases of blindness and sight recovery in our human model. We applied stringent selection criteria for CC individuals based on medical records to make a total and dense cataract at birth likely. Nevertheless, some uncertainty about the precise morphology of the cataractous lens inevitably remains in patients who presented for surgery long after birth. In addition, although the age of cataract onset is known in CC individuals (congenital), it is not clearly defined in DC individuals because of the typically gradual onset of developmental cataracts.
However, because age at surgery varied considerably but results were quite consistent across CC individuals (see Figure S8), our findings point to a crucial role of the first year in human brain development in accord with neuroanatomical studies (Gilmore et al., 2020).

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

[…] the DC group, t(18.78) = 4.44, p < .001. All CC and DC individuals reported normal hearing and had no known neurological disorders. Cataract-reversal individuals in the CC and DC groups were compared to a group of 10 normally sighted control (SC) individuals (aged 15-30 years, M = 19.1, SD = 4.8, three female and seven male) who had typical development of all sensory systems. All 10 SC individuals had normal or corrected-to-normal vision according to their self-report. Additional visual acuity data were available for seven SC individuals, who had a mean logMAR acuity of −0.17 (range: −0.24 to −0.10). The age range of the SC group (15-30 years) largely overlapped with the age ranges of the CC and DC groups, except for the youngest participants in the DC group (minimum age of 9 years) and the oldest participants in the CC group (maximum age of 41 years). However, previous studies have shown that the experimental effects of interest, the VE and VAE, are adult-like in 8- to 9-year-old children (Rohlf et al., 2020) and either do not differ between younger and older adults (Stawicki et al., 2019) or differ only between younger adults and adults aged above 60 years (Park et al., 2021).
All CC, DC, and SC individuals were recruited at the LV Prasad Eye Institute in Hyderabad, India. Participants or, for minors, their legal guardians provided written informed consent prior to taking part in the study. Expenses associated with taking part in the study, such as travel costs, were reimbursed and minors received a small gift. The study was approved by the Local Ethics Committee of the Faculty of Psychology and Human Movement at the University of Hamburg, Germany, as well as by the Institutional Review Board at the LV Prasad Eye Institute in Hyderabad, India, and was performed in accordance with the ethical standards laid down in the Declaration of Helsinki.

General procedure
The experimental setup resembled those used in typical studies of the VE and VAE (Frissen et al., 2003; Jackson, 1953; Körding et al., 2007; Park and Kayser, 2019, 2021; Rohe and Noppeney, 2015; Wozny et al., 2010; Wozny and Shams, 2011; Zierul et al., 2017). Participants faced the center of four loudspeakers (Companion 2, Bose Corporation, Framingham, MA, USA), which were placed at a distance of 45 cm at eccentricities of ±5° and ±15° from the participants' straight-ahead position (0°). The loudspeakers were covered by an acoustically transparent cloth. Four pairs of red LEDs were attached on top of the loudspeaker array, one pair at the center of each loudspeaker location. LED pairs rather than single LEDs were used to increase the saliency of the visual stimulation, because we were concerned that residual unimodal visual localization impairments in CC individuals (and possibly in DC individuals) would otherwise have masked any genuine group differences in multisensory processing (Badde et al., 2020). For visual stimulation, LEDs were illuminated for 35 ms. Auditory stimuli were white noise bursts with a duration of 35 ms. Audiovisual stimuli were always presented synchronously. Participants indicated perceived stimulus locations with an array of 12 pushbuttons, which was placed directly in front of the loudspeakers and had the same width as the loudspeaker array. A schematic illustration of the experimental setup is shown in Figure 1.
The experiment consisted of two blocks of 240 trials each. Block 1 was designed to measure VE and VAEi, and Block 2 was designed to measure VAEc. Each block lasted approximately 15 min and included 40 unimodal auditory (A) trials, 40 unimodal visual (V) trials, and 160 audiovisual (AV) trials. After each trial, participants indicated the perceived location of the A and/or V stimulus by pressing the corresponding button in front of the loudspeaker/LED array. On AV trials, two responses were required, one for the A and one for the V stimulus. Half of the participants reported A first and V second throughout the experiment, and vice versa for the other half of the participants. The task was self-paced and accuracy was stressed over response speed. The next trial was presented between 900 and 1400 ms (randomly determined) after the (second) response.

Block 1 (VE and VAEi)
A and V trials were equally distributed across the four locations (10 trials per location each). In AV trials, each possible combination of the four A and V locations occurred 10 times, resulting in AV spatial discrepancies of ±30° (10 trials each), ±20° (20 trials each), ±10° (30 trials each), or 0° (40 trials). A, V, and AV trials were presented in a pseudorandom order, with the constraint that A and V trials were equally often preceded (disregarding any interjacent unimodal trials) by AV trials with ±20°, ±10°, and 0° spatial discrepancy (8 trials for each preceding AV discrepancy). AV trials with ±30° discrepancy were never immediately followed by a unimodal trial.

Block 2 (VAEc)
A and V trials were equally distributed across the four locations as in Block 1. However, in AV trials the V stimulus was always 10° to the right of the A stimulus. Thus, only three combinations of A and V locations were presented (A: −15°, V: −5°, 53 trials; A: −5°, V: 5°, 54 trials; A: 5°, V: 15°, 53 trials). A, V, and AV trials were presented in random order.
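To make the trial arithmetic of the two blocks concrete, the following Python sketch enumerates the Block 1 AV location pairs, tallies the signed discrepancies, and sums the Block 2 trial counts. It only restates the design described above; the function and constant names are hypothetical, and the pseudorandom ordering constraints are not implemented.

```python
from collections import Counter
from itertools import product

# Stimulus eccentricities in degrees (negative = left, positive = right).
LOCATIONS = (-15, -5, 5, 15)

# Block 2 AV pairs (A location, V location) with their reported trial counts;
# the V stimulus is always 10 degrees to the right of the A stimulus.
BLOCK2_AV = {(-15, -5): 53, (-5, 5): 54, (5, 15): 53}


def block1_av_trials(reps_per_pair=10):
    """Every (A, V) location pair, repeated reps_per_pair times."""
    return [pair for pair in product(LOCATIONS, LOCATIONS)
            for _ in range(reps_per_pair)]


def discrepancy_counts(trials):
    """Number of AV trials per signed discrepancy V - A (positive: V right of A)."""
    return dict(sorted(Counter(v - a for a, v in trials).items()))


if __name__ == "__main__":
    trials = block1_av_trials()
    print(len(trials))                # 160 AV trials in Block 1
    print(discrepancy_counts(trials))
    print(sum(BLOCK2_AV.values()))    # 160 AV trials in Block 2
```

The discrepancy tally comes out as {-30: 10, -20: 20, -10: 30, 0: 40, 10: 30, 20: 20, 30: 10}, matching the counts given for Block 1.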

VE, VAEi, and VAEc
As in previous studies using categorical localization responses (Bruns et al., 2011; Bruns and Röder, 2010; Zierul et al., 2017), individual responses were coded as 1-12 (corresponding to the 12 pushbuttons from left to right) and then averaged across the 10 unimodal trials (A and V separately) per location in Block 1. These baseline localization scores were then subtracted from the localization responses in each trial of Blocks 1 and 2 to determine signed localization errors (negative values indicate a leftward and positive values a rightward localization bias). Localization errors were converted from response units to degrees for visualization purposes.
Ventriloquism effects (VE) were calculated as the difference in localization errors between AV trials with a rightward discrepancy (V to the right of the A location) and AV trials with a leftward discrepancy (V to the left of the A location) in Block 1. Immediate ventriloquism aftereffects (VAEi) were calculated as the difference in localization errors between unimodal trials that were preceded by an AV trial with a rightward discrepancy and unimodal trials that were preceded by an AV trial with a leftward discrepancy in Block 1. Cumulative ventriloquism aftereffects (VAEc) were calculated as the mean localization errors in unimodal trials of Block 2, which indicate the difference in localization responses between Block 1 (average AV discrepancy of 0°) and Block 2 (constant AV discrepancy of 10°). VE, VAEi, and VAEc were each calculated separately for A and V responses. Correlations between VE, VAEi, and VAEc effects, as well as correlations between effects and logMAR visual acuity, were calculated as Spearman's rank correlations.
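The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming responses are stored as (location, discrepancy, button code) tuples; the data layout, function names, and toy values are hypothetical, and effects stay in response units because the conversion factor to degrees is not given here.

```python
from statistics import mean


def baseline_scores(unimodal_block1):
    """Mean button code (1-12, left to right) per stimulus location.

    unimodal_block1: (location_deg, code) pairs for ONE modality (A or V)
    taken from the unimodal trials of Block 1.
    """
    per_location = {}
    for location, code in unimodal_block1:
        per_location.setdefault(location, []).append(code)
    return {loc: mean(codes) for loc, codes in per_location.items()}


def signed_error(location, code, baseline):
    """Localization error in response units (negative = leftward bias)."""
    return code - baseline[location]


def directional_effect(trials, baseline):
    """Mean error for rightward minus leftward AV discrepancies.

    trials: (location_deg, discrepancy_deg, code) tuples, where discrepancy_deg
    is V minus A of the current AV trial (VE) or of the preceding AV trial
    (VAEi). Trials with zero discrepancy do not enter the difference.
    """
    right = [signed_error(loc, c, baseline) for loc, d, c in trials if d > 0]
    left = [signed_error(loc, c, baseline) for loc, d, c in trials if d < 0]
    return mean(right) - mean(left)


def cumulative_aftereffect(unimodal_block2, baseline):
    """VAEc: mean signed error over the unimodal trials of Block 2."""
    return mean(signed_error(loc, c, baseline) for loc, c in unimodal_block2)


# Hypothetical toy data for auditory responses.
base_a = baseline_scores([(-15, 2), (-15, 3), (-5, 5), (-5, 5),
                          (5, 8), (5, 8), (15, 11), (15, 11)])
ve_a = directional_effect([(-5, 10, 6), (-5, -10, 4), (5, 20, 9), (5, -20, 7)], base_a)
vaec_a = cumulative_aftereffect([(-5, 6), (5, 9), (15, 11)], base_a)
```

With these toy values, each rightward-discrepancy response lies one button right of baseline and each leftward one lies one button left, so the auditory VE is 2 response units.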
The size of the auditory and visual VE, VAEi, and VAEc was compared between groups by means of Welch ANOVAs and Games-Howell post-hoc comparisons. In addition, one-tailed, one-sample t tests were used to test whether effects were significantly larger than zero (Holm-corrected for multiple comparisons). All tests were additionally performed as Bayesian hypothesis tests using standard priors in JASP Version 0.14 (Wagenmakers et al., 2018), and Bayes factors (BF10 for two-tailed tests and BF+0/BF−0 for one-tailed tests) are reported to indicate the evidential value for the null or alternative hypothesis, respectively.
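The Holm correction mentioned above is a step-down procedure. The sketch below computes Holm-adjusted p-values in plain Python; it illustrates the standard procedure and is not JASP's code, and the function name and example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and kept monotone non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical one-tailed p-values for three effects tested against zero.
print(holm_adjust([0.01, 0.04, 0.03]))
```

An effect is then considered significant at level α when its adjusted p-value does not exceed α.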

Model descriptions
We conducted modeling analyses to probe the computational components of the multisensory integration and learning processes in CC, DC, and SC individuals. Audiovisual integration is well described by Bayesian hierarchical causal inference models (Kayser and Shams, 2015; Körding et al., 2007; Wozny and Shams, 2011). The general idea is that cue integration improves perceptual precision and accuracy only when the cues provide redundant information about the same object or event; otherwise, the cues should be processed separately. Therefore, the perceptual system needs to perform causal inference to determine which cues originated from the same cause and should be integrated. The causal inference (CI) models estimate how likely it is that the auditory and visual cues share a common cause and arbitrate between integrating and segregating the cues accordingly. Not only do these models provide normative benchmarks against which behavioral VE data can be compared, they also allow quantitative assessment of any changes associated with the VAE, which are reflected in changes of the model parameters (Wozny and Shams, 2011).
We considered three different CI models, as well as four baseline models, for the audiovisual localization responses of each participant. All the models assume that the auditory (A) and visual (V) stimuli at the locations ($s_A$, $s_V$) give rise to noisy internal sensations ($x_A$, $x_V$), and that ($x_A$, $x_V$) vary from trial to trial even for the same ($s_A$, $s_V$). The models have access only to the noisy ($x_A$, $x_V$) and must estimate ($s_A$, $s_V$). Below we describe the important components of the CI models and the baseline models.
OPEN ACCESS iScience 25, 104439, June 17, 2022

The Causal Inference (CI) Models. The CI models infer the unknown causal structure ($C$) of the stimuli in order to estimate the true locations ($s_A$, $s_V$). A and V may have a common cause ($C = 1$) or independent causes ($C = 2$). If A and V are caused by the same object ($C = 1$), then the optimal approach to estimate the one true location ($s = s_A = s_V$) is to integrate the model's A and V location estimates ($\hat{s}_A$, $\hat{s}_V$) according to the relative reliabilities of the A and V sensory cues. This approach is often referred to as optimal integration (Alais and Burr, 2004; Rohde et al., 2016) or forced fusion (De Winkel et al., 2015). If A and V are caused by independent objects ($C = 2$), then the true A and V locations should be estimated separately, based solely on each unimodal sensory cue. The CI models are probabilistic models that represent information uncertainty in terms of probability distributions (Li et al., 2020). Each CI model takes the perspective of a Bayesian ideal observer (Geisler, 2011), which combines sensory evidence and prior belief to calculate the probability of the underlying causal structure $C$ according to Bayes' rule:

$$p(C \mid x_A, x_V) = \frac{p(x_A, x_V \mid C)\, p(C)}{p(x_A, x_V)} \qquad (1)$$

Here, $C$ can be 1 (common cause) or 2 (independent causes), and $p(C \mid x_A, x_V)$ is the posterior probability ("posterior") of $C$ given the sensory data ($x_A$, $x_V$). The prior probability ("prior") for the causal structure is denoted as $p(C)$. Specifically, the prior for a common cause, $p(C = 1)$, is known as the causal prior (Körding et al., 2007) and is hereafter denoted as $P_c$ for short. $P_c$ represents the prior belief about how likely the cues share a common cause. Because the models consider only two possible causal structures, the prior for independent causes, $p(C = 2)$, simply equals $1 - P_c$. The likelihood function ("likelihood"), $p(x_A, x_V \mid C)$, gives the joint probability of the sensory data ($x_A$, $x_V$) given a certain causal structure $C$.
The likelihood quantifies the probabilistic mapping between the physical environment and the internal sensory representations.
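For a common cause ($C = 1$), the posterior probability in Equation (1) is

$$p(C = 1 \mid x_A, x_V) = \frac{p(x_A, x_V \mid C = 1)\, P_c}{p(x_A, x_V \mid C = 1)\, P_c + p(x_A, x_V \mid C = 2)\,(1 - P_c)} \qquad (2)$$

Like the priors, the posteriors sum to one; therefore, the posterior for $C = 2$, $p(C = 2 \mid x_A, x_V)$, simply equals $1 - p(C = 1 \mid x_A, x_V)$. The joint likelihood of obtaining the sensory data ($x_A$, $x_V$) under $C = 1$ is calculated by integration:

$$p(x_A, x_V \mid C = 1) = \int p(x_A \mid s)\, p(x_V \mid s)\, p(s)\, ds \qquad (3)$$

where $s = s_A = s_V$. The location prior, $p(s)$, is the prior probability distribution over the possible stimulus locations; it represents the expectation of where the stimulus is likely to occur. Given a stimulus at location $s$, $p(x_A \mid s)$ and $p(x_V \mid s)$ are the unimodal likelihoods of sensing $x_A$ and $x_V$, respectively. Assuming Gaussian distributions for $p(s)$, $p(x_A \mid s)$, and $p(x_V \mid s)$, Equation (3) can be solved analytically as

$$p(x_A, x_V \mid C = 1) = \frac{1}{2\pi\sqrt{\sigma_A^2\sigma_V^2 + \sigma_A^2\sigma_P^2 + \sigma_V^2\sigma_P^2}} \exp\left[-\frac{1}{2}\,\frac{(x_A - x_V)^2\sigma_P^2 + (x_A - \mu_P)^2\sigma_V^2 + (x_V - \mu_P)^2\sigma_A^2}{\sigma_A^2\sigma_V^2 + \sigma_A^2\sigma_P^2 + \sigma_V^2\sigma_P^2}\right] \qquad (4)$$

Here, $\mu_P$ and $\sigma_P$ are the mean and standard deviation of the Gaussian location prior $p(s)$, and $\sigma_A$ and $\sigma_V$ are the standard deviations of the Gaussian noise that corrupts the unisensory A and V signals, respectively.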
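Equations (1) to (4) can be checked numerically. The following Python sketch, using only the standard library, evaluates the closed-form likelihood of Equation (4), compares it with a midpoint-rule integration of Equation (3), and computes the posterior of Equation (2). The likelihood under independent causes is written as the product of the two marginal Gaussians, which is the standard form when both modalities share the same Gaussian location prior; that form, the integration range, and all function names are assumptions of this sketch.

```python
import math


def normal_pdf(x, mu, sd):
    """Gaussian density N(x; mu, sd)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood_common(xa, xv, sa, sv, mu_p, sp):
    """Closed form of Eq. (4): p(x_A, x_V | C = 1)."""
    va, vv, vp = sa**2, sv**2, sp**2
    denom = va * vv + va * vp + vv * vp
    quad = (xa - xv) ** 2 * vp + (xa - mu_p) ** 2 * vv + (xv - mu_p) ** 2 * va
    return math.exp(-0.5 * quad / denom) / (2 * math.pi * math.sqrt(denom))


def likelihood_independent(xa, xv, sa, sv, mu_p, sp):
    """p(x_A, x_V | C = 2): product of the two marginal likelihoods (assumed form)."""
    return (normal_pdf(xa, mu_p, math.sqrt(sa**2 + sp**2))
            * normal_pdf(xv, mu_p, math.sqrt(sv**2 + sp**2)))


def likelihood_common_numeric(xa, xv, sa, sv, mu_p, sp, lo=-200.0, hi=200.0, n=40000):
    """Eq. (3) by midpoint-rule integration over s, as a check on Eq. (4)."""
    ds = (hi - lo) / n
    total = 0.0
    for i in range(n):
        s = lo + (i + 0.5) * ds
        total += normal_pdf(xa, s, sa) * normal_pdf(xv, s, sv) * normal_pdf(s, mu_p, sp)
    return total * ds


def posterior_common(xa, xv, sa, sv, mu_p, sp, pc):
    """Eqs. (1)-(2): posterior probability of a common cause."""
    l1 = likelihood_common(xa, xv, sa, sv, mu_p, sp) * pc
    l2 = likelihood_independent(xa, xv, sa, sv, mu_p, sp) * (1 - pc)
    return l1 / (l1 + l2)
```

With hypothetical values such as sa = 8, sv = 2, mu_p = 0, and sp = 20, the closed form and the numerical integral agree to many decimal places, and the posterior of a common cause drops as the sensations x_A and x_V move apart.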
By contrast, if the cues have independent causes ($C = 2$), then the joint likelihood of observing the sensory data ($x_A$, $x_V$) is calculated by

$$p(x_A, x_V \mid C = 2) = \int p(x_A \mid s_A)\, p(s_A)\, ds_A \int p(x_V \mid s_V)\, p(s_V)\, ds_V$$

[…] Minimizing this cost function under $C = 1$ gives the analytical solution

$$\hat{s}_{A,C=1} = \hat{s}_{V,C=1} = \frac{x_A/\sigma_A^2 + x_V/\sigma_V^2 + \mu_P/\sigma_P^2}{1/\sigma_A^2 + 1/\sigma_V^2 + 1/\sigma_P^2} \qquad (8)$$

This is the optimal estimate if $C = 1$. Equation (8) is a more general form of the optimal integration model (Alais and Burr, 2004; Ernst and Banks, 2002). If the models do not account for the location prior (which is equivalent to using a uniform location prior: $\mu_P = 0$, $\sigma_P \to \infty$), then Equation (8) reduces to the optimal integration model, which weights the sensory data by their relative reliabilities, where reliability is represented by the inverse variance $1/\sigma_A^2$ or $1/\sigma_V^2$.
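Under $C = 2$, minimizing the cost function in Equation (7) gives the analytical solutions for the A and V estimates separately as:

$$\hat{s}_{A,C=2} = \frac{x_A/\sigma_A^2 + \mu_P/\sigma_P^2}{1/\sigma_A^2 + 1/\sigma_P^2} \qquad (9a)$$

$$\hat{s}_{V,C=2} = \frac{x_V/\sigma_V^2 + \mu_P/\sigma_P^2}{1/\sigma_V^2 + 1/\sigma_P^2} \qquad (9b)$$

These are the optimal estimates if $C = 2$, which are equivalent to estimating the A and V locations independently, based solely on the respective unisensory data.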
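Equations (8), (9a), and (9b) are precision-weighted averages in which the prior mean $\mu_P$ acts like an additional cue with weight $1/\sigma_P^2$. A minimal Python sketch (function names and values hypothetical) shows how the fused estimate is drawn toward the more reliable visual cue, while the segregated auditory estimate is only pulled slightly toward $\mu_P$:

```python
def fused_estimate(x_a, x_v, sigma_a, sigma_v, mu_p, sigma_p):
    """Eq. (8): common-cause estimate shared by A and V."""
    w_a, w_v, w_p = 1 / sigma_a**2, 1 / sigma_v**2, 1 / sigma_p**2
    return (w_a * x_a + w_v * x_v + w_p * mu_p) / (w_a + w_v + w_p)


def segregated_estimate(x, sigma, mu_p, sigma_p):
    """Eqs. (9a)/(9b): one modality's sensation combined with the location prior."""
    w, w_p = 1 / sigma**2, 1 / sigma_p**2
    return (w * x + w_p * mu_p) / (w + w_p)


# Hypothetical values: a noisy auditory sensation at 10 deg, a precise visual one at 0 deg.
fused = fused_estimate(10.0, 0.0, sigma_a=8.0, sigma_v=2.0, mu_p=0.0, sigma_p=20.0)
seg_a = segregated_estimate(10.0, sigma=8.0, mu_p=0.0, sigma_p=20.0)
```

Here fused is about 0.58° and seg_a about 8.6°. As sigma_p grows very large, fused_estimate approaches the purely reliability-weighted average 10/17 ≈ 0.59°, the no-prior case of Equation (8).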
The three CI models we considered are identical in applying Equations (1, 2, 3, 4, 5, 6, 7, 8, 9a, and 9b) but differ in how they combine the estimates obtained under $C = 1$ (integration) and $C = 2$ (segregation) to generate the responses for A and V localization. The optimal strategy to combine the estimates is model averaging (MA), because it minimizes the mean expected squared error (Körding et al., 2007). The CI model with the MA strategy (CI-MA) calculates weighted averages of the estimates under $C = 1$ and $C = 2$, with the posterior probabilities of $C = 1$ and $C = 2$ as weights:

$$\hat{s}_A = p(C = 1 \mid x_A, x_V)\,\hat{s}_{A,C=1} + p(C = 2 \mid x_A, x_V)\,\hat{s}_{A,C=2}$$

$$\hat{s}_V = p(C = 1 \mid x_A, x_V)\,\hat{s}_{V,C=1} + p(C = 2 \mid x_A, x_V)\,\hat{s}_{V,C=2}$$

In addition to the MA strategy, behavioral studies on causal inference have shown that humans and nonhuman primates sometimes apply two heuristic alternatives for audiovisual localization […]:

$$\hat{s}_A = \begin{cases} \hat{s}_{A,C=1} & \text{if } p(C = 1 \mid x_A, x_V) > \xi \\ \hat{s}_{A,C=2} & \text{otherwise} \end{cases}$$

(and analogously for $\hat{s}_V$), where $\xi$ is sampled from a uniform distribution on $[0, 1]$ on each trial. For example, if the posterior probability of $C = 1$ is 0.6 on a trial, then 60% of the time the CI-PM model will choose $\hat{s}_{A,C=1}$ as its final estimate for the A location, and 40% of the time it will choose $\hat{s}_{A,C=2}$ (Wozny et al., 2010). The CI model with the MS strategy (CI-MS) compares the posterior probabilities of $C = 1$ and $C = 2$ and simply uses the estimates from the more probable causal structure, that is, the causal structure whose posterior probability is > 0.5:

$$\hat{s}_A = \begin{cases} \hat{s}_{A,C=1} & \text{if } p(C = 1 \mid x_A, x_V) > 0.5 \\ \hat{s}_{A,C=2} & \text{otherwise} \end{cases}$$

For example, if the posterior probability of $C = 1$ is 0.6 on a trial, then the CI-MS model will choose $\hat{s}_{A,C=1}$ as its final estimate for the A location (Wozny et al., 2010).
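The three read-out strategies differ only in how the posterior $p(C = 1 \mid x_A, x_V)$ is used to combine the two conditional estimates. A minimal Python sketch, with hypothetical names, assuming the estimates under $C = 1$ and $C = 2$ have already been computed for one modality:

```python
import random


def read_out(p_common, est_c1, est_c2, strategy, rng=random):
    """Final location estimate of one modality under a CI read-out strategy.

    p_common: posterior p(C = 1 | x_A, x_V); est_c1 / est_c2: estimates
    under C = 1 (Eq. 8) and C = 2 (Eq. 9a or 9b).
    """
    if strategy == "MA":   # model averaging: posterior-weighted mean
        return p_common * est_c1 + (1 - p_common) * est_c2
    if strategy == "PM":   # probability matching: compare with a uniform draw
        return est_c1 if p_common > rng.random() else est_c2
    if strategy == "MS":   # model selection: take the more probable structure
        return est_c1 if p_common > 0.5 else est_c2
    raise ValueError(f"unknown strategy: {strategy}")


# With p(C = 1) = 0.6, PM returns the C = 1 estimate on roughly 60% of trials.
rng = random.Random(1)
share_c1 = sum(read_out(0.6, 1.0, 0.0, "PM", rng) for _ in range(10_000)) / 10_000
```

MA varies smoothly with the posterior, whereas PM and MS always return one of the two conditional estimates; PM does so stochastically and MS deterministically.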
Baseline Models. In addition to the three causal inference models (CI-MA, CI-PM, and CI-MS), we considered four baseline models. The baseline models do not perform causal inference; instead, they assume a fixed causal structure and always either fuse or segregate the A and V cues to estimate the stimulus locations.
Two of the baseline models take into account the location prior: (1) A forced fusion (FF) model with a Gaussian location prior. This model always integrates A and V sensory data based on their reliabilities according to Equation (8). The FF model assumes a common cause and is equivalent to the C = 1 branch of a CI model. (2) A segregation (SG) model with a Gaussian location prior. This model completely segregates A and V sensory data without any integration or inter-modal influence. The A and V locations are estimated independently, based solely on the information within each modality according to Equations (9a) and (9b). The SG model assumes independent causes and is equivalent to the C = 2 branch of a CI model.
To check whether the prior parameters were necessary for modeling our data, we additionally tested simplified versions of the above two baseline models without the location prior: (3) a forced fusion model with no priors (FFnp), and (4) a segregation model with no priors (SGnp). Mathematically, having no priors is equivalent to assuming a flat location prior ($\mu_P = 0$, $\sigma_P \to \infty$), which means that these models have no expectations about the stimulus location, and a fixed causal prior ($P_c = 1$ for FFnp, $P_c = 0$ for SGnp). The FFnp model is identical to the optimal integration model (Alais and Burr, 2004; Ernst and Banks, 2002) and applies a simplified version of Equation (8): $\hat{s}_A = \hat{s}_V = (x_A/\sigma_A^2 + x_V/\sigma_V^2)/(1/\sigma_A^2 + 1/\sigma_V^2)$.
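Because the baseline models fix the causal structure, each reduces to one branch of the CI computation. The sketch below (hypothetical names) represents "no priors" by a flat location prior, that is, $\sigma_P = \infty$, for which the prior term receives zero weight:

```python
import math


def _precision(sigma):
    """Inverse variance; an infinite SD (flat prior) contributes zero precision."""
    return 0.0 if math.isinf(sigma) else 1 / sigma**2


def baseline_estimates(model, x_a, x_v, sigma_a, sigma_v, mu_p=0.0, sigma_p=math.inf):
    """(A estimate, V estimate) for the FF, SG, FFnp, and SGnp baselines.

    FF/FFnp always fuse (P_c = 1); SG/SGnp always segregate (P_c = 0).
    The 'np' variants ignore mu_p and sigma_p by using a flat prior.
    """
    if model.endswith("np"):
        sigma_p = math.inf
    w_a, w_v, w_p = _precision(sigma_a), _precision(sigma_v), _precision(sigma_p)
    if model in ("FF", "FFnp"):
        fused = (w_a * x_a + w_v * x_v + w_p * mu_p) / (w_a + w_v + w_p)
        return fused, fused
    if model in ("SG", "SGnp"):
        return ((w_a * x_a + w_p * mu_p) / (w_a + w_p),
                (w_v * x_v + w_p * mu_p) / (w_v + w_p))
    raise ValueError(f"unknown model: {model}")
```

For example, baseline_estimates("FFnp", 10.0, 0.0, 8.0, 2.0) returns the reliability-weighted average 10/17 for both modalities, whereas "SGnp" returns the sensations themselves.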

Model Parameters and Simulations
Causal inference models. Each CI model has seven parameters: $\Delta_A$ and $\sigma_A$, the bias and the standard deviation (SD) of the auditory Gaussian likelihood; $\Delta_V$ and $\sigma_V$, the bias and the SD of the visual Gaussian likelihood; $\mu_P$ and $\sigma_P$, the mean and the SD of the Gaussian location prior for stimuli of either modality; and $P_c$, the causal prior, that is, the prior for a common cause.
Typical CI models assume unbiased likelihoods and priors (e.g., Körding et al., 2007; Wozny et al., 2010); that is, they assume that the likelihoods are centered at the true stimulus locations ($\Delta_A = 0$, $\Delta_V = 0$) and that the location prior is centered at 0° azimuth ($\mu_P = 0$). Here we incorporate non-zero bias terms to probe whether the VAEc is associated with changes in the likelihood or prior biases, in other words, whether a shift in the likelihood or location prior distribution is a possible computational mechanism underlying the VAEc (for details, see Wozny and Shams, 2011).
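The seven parameters, including the bias terms, enter a Monte Carlo simulation of localization responses with 10,000 simulated trials per parameter and location combination. A minimal Python sketch of that simulation step, under stated assumptions: it implements only the CI-MA strategy, omits the prior-bias term (whose description is truncated in this excerpt) and any mapping of estimates onto the 12 response buttons, and uses hypothetical function and parameter names. It is not the authors' code.

```python
import math
import random
from statistics import mean


def simulate_ci_ma(s_a, s_v, delta_a, sigma_a, delta_v, sigma_v, mu_p, sigma_p, pc,
                   n_trials=10_000, seed=0):
    """Simulated CI-MA estimates (A list, V list) for one (s_A, s_V) pair.

    Sensations are drawn as x ~ N(s + delta, sigma); the prior-bias term and
    response discretization of the original procedure are omitted.
    """
    rng = random.Random(seed)
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2
    d1 = va * vv + va * vp + vv * vp
    resp_a, resp_v = [], []
    for _ in range(n_trials):
        x_a = rng.gauss(s_a + delta_a, sigma_a)
        x_v = rng.gauss(s_v + delta_v, sigma_v)
        # Likelihood under a common cause, Eq. (4).
        q1 = (x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv + (x_v - mu_p)**2 * va
        lik1 = math.exp(-0.5 * q1 / d1) / (2 * math.pi * math.sqrt(d1))
        # Likelihood under independent causes: product of marginal Gaussians.
        q2 = (x_a - mu_p)**2 / (va + vp) + (x_v - mu_p)**2 / (vv + vp)
        lik2 = math.exp(-0.5 * q2) / (2 * math.pi * math.sqrt((va + vp) * (vv + vp)))
        post1 = lik1 * pc / (lik1 * pc + lik2 * (1 - pc))                       # Eq. (2)
        fused = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)  # Eq. (8)
        seg_a = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)                      # Eq. (9a)
        seg_v = (x_v / vv + mu_p / vp) / (1 / vv + 1 / vp)                      # Eq. (9b)
        resp_a.append(post1 * fused + (1 - post1) * seg_a)  # model averaging
        resp_v.append(post1 * fused + (1 - post1) * seg_v)
    return resp_a, resp_v


# Hypothetical parameters: imprecise audition, precise vision, unbiased likelihoods.
params = dict(delta_a=0.0, sigma_a=8.0, delta_v=0.0, sigma_v=2.0,
              mu_p=0.0, sigma_p=20.0, pc=0.5)
a_right, _ = simulate_ci_ma(0.0, 10.0, **params)   # V 10 deg right of A
a_left, _ = simulate_ci_ma(0.0, -10.0, **params)   # V 10 deg left of A
simulated_ve_a = mean(a_right) - mean(a_left)      # simulated auditory VE (deg)
```

Comparing the mean auditory estimate for V to the right versus to the left of A yields a simulated auditory VE, analogous to the behavioral measure. Fitting the parameters to each participant's responses is not shown.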
For each combination of the seven parameters and each combination of true audiovisual (AV) stimulus locations ($s_A$, $s_V$), each CI model performed 10,000 Monte Carlo simulated AV localization trials. The noisy internal sensory representations $x_A$ and $x_V$ were generated according to the Gaussian likelihood functions $p(x_A \mid s)$ and $p(x_V \mid s)$: on each simulated trial, $x_A$ and $x_V$ were independently sampled from two Gaussian distributions representing the independent Gaussian noise that corrupted the A and V signals. The Gaussian distributions were centered at the true stimulus locations $s_A$ and $s_V$ plus the likelihood biases $\Delta_A$ and $\Delta_V$, respectively, with $\sigma_A$ and $\sigma_V$ determining the strength of the Gaussian noise: $x_A \sim \mathcal{N}(s_A + \Delta_A, \sigma_A)$ and $x_V \sim \mathcal{N}(s_V + \Delta_V, \sigma_V)$. Additionally, the model accounted for a prior bias for the stimulus location,